Abstract

For some variants of regression models, including partial, measurement error
or error-in-variables, latent effects, semi-parametric and otherwise corrupted
linear models, the classical parametric tests generally do not perform well. Various
modifications and generalizations considered extensively in the literature rest on
stringent regularity assumptions which are not likely to be tenable in many
applications. In such non-standard cases, however, rank based tests can be adapted
better, and the incorporation of rank analysis of covariance tools further enhances
their power-efficiency. Numerical studies and a real data illustration show the
superiority of rank based inference in such corrupted linear models.

Key words: Latent variable; Measurement error; Mixed regression model; Partially 
linear model; Rank analysis of covariance; Rank analysis of variance; Rank test of 
linear hypothesis 

1 Introduction 

Classical linear regression models impose stringent additivity, linearity,
homoscedasticity and normality assumptions which may not be tenable in many
applications, giving rise to the so-called corrupted linear models, in which one or
more of these assumptions fail. In simple nonparametric linear models, the normality
assumption has been dispensed with in favor of a more general class of continuous
distributions. Yet, in more contemporary applications in biomedical, clinical and
genomics studies, the very assumption of linearity may be questionable. Sans such a
linear setup, the performance of rank based testing procedures may be generally far
better than that of their strict parametric counterparts. Our contemplated corrupted
linear models relate to this scenario, where the basic linearity assumption is vitiated
by possible error-in-variables, measurement errors, possible latent effects, and the
so-called random effects and mixed effects; even partial linear models and some
semi-parametric models belong to this contemplated class. For example, Fuller (1987)
has detailed a large class of models which can be classified as measurement error or
error-in-variable models; some genuine identifiability issues may crop up in the use
of standard parametric inference. Another variation is the usual regression model with
stochastic predictors, whose possibly non-normal distribution can create stumbling
blocks to the adaptation of standard parametric methods. In addition, such stochastic
predictors may not be linearly related to the primary response variable. The impact of
such nonregular setups on statistical tests has been considered by Ghosh and Sen
(1971), followed by more general treatments by others. In a semi-parametric setup,
partial linear models were introduced mostly during the 1980s and 1990s (Heckman
(1986), Speckman (1988), Khuri, Mathew and Sinha (1988), Chen (1988), Gao (1995),
He and Shi (1996), Liang et al. (1999), Härdle et al. (2000), He and Liang (2000),
and Boente and Rodriguez (2006), among others). Incorporation of measurement errors
in this setup evolved first in nonlinear models (Carroll et al. (2006)) and then in
nonparametric setups only in the past decade. For nonlinear models one may try to
mimic the linear model setups with linear or quadratic approximations, but those
approximations, possibly inadequate, introduce a second source of non-robustness.
Motivated by this diversity of models and the need for a unified view of such
nonstandard or corrupted linear models, the present study mainly aims to introduce
such corrupted linear models in a more general setup, exhibit the supremacy of rank
based tests and illustrate their adaptability in some real applications.

Consider a semiparametric partially linear model where a real response $Y$ is
regressed on a set of observable covariates $x$ and further depends on some possibly
unobservable $Z$ in the form

$$Y_i = \beta_0 + x_i^\top \beta + \nu(Z_i) + e_i, \quad i = 1, \dots, n, \tag{1.1}$$

where the $x_i$ are known (non-stochastic) $p$-vectors, not all the same, $Z_i$ is a
stochastic $q$-vector covariate ($q \ge 1$), and the form of the function $\nu(Z_i)$ is
unspecified. Moreover, the $Z_i$ may be observable, partially observable or
unobservable; in the latter case, they lead to latent effects models. If the $Z_i$ are
observable, possibly with measurement errors, (1.1) relates to a partially linear and
measurement error model. The unknown $\nu(\cdot)$ links (1.1) to the semiparametric
model. A big advantage of the rank procedure is that it avoids a nonparametric
estimation of the unknown $\nu(\cdot)$. The literature recommends functional estimation
procedures using various smoothing tools; but these demand smoothness assumptions and
usually result in slower rates of convergence than the rank procedures. We refer to
Heckman (1986), Speckman (1988), Chen (1988), He and Shi (1996), He and Liang (2000),
Bianco et al. (2006), Boente and Rodriguez (2006), among other works. The Härdle et
al. (2000) monograph is noteworthy in this context.

An alternative approach is a transformation of variables in regression problems
which achieves linearity or normality; but this usually sacrifices the homoscedasticity
condition. The heteroscedastic models and models with measurement errors were
intensively treated in the literature; we refer to Fuller (1987), Cheng and van Ness
(1999) and Carroll et al. (2006), and to additional references cited therein.

In contrast to the above methods, we put the main emphasis on nonparametric tests
based on rank statistics. They are valid also for non-normal error distributions, do not
demand finite variances, and their asymptotic forms typically have the standard
rate of convergence. Smoothing techniques such as B-splines and kernel smoothing,
which are commonly used for estimation in the semiparametric linear models, generally
require a large $n$ and result in a slower rate of convergence than the rank procedures.

The problem of testing the monotonicity of regression has also been treated with a
nonparametric approach in a semiparametric setup. These models can sometimes be
reduced to (1.1) by a suitable reformulation.

We often want to test the null hypothesis of no or partial regression of $Y$ on $x$,
treating $\beta_0$ and $\nu(\cdot)$ as nuisance parameters and functions, respectively.
The statistical interest is then confined to the fixed-effect parameter $\beta$, regarding
$\nu(\cdot)$ as a nuisance function, similarly as in the [8] proportional hazard model.
More precisely, we want to test

$$H_0 : \beta = 0 \quad \text{vs} \quad H_1 : \beta \ne 0 \tag{1.2}$$

with nuisance $\beta_0$ and $\nu(\cdot)$.

Although $\nu(\cdot)$ is unspecified in (1.1), it is of interest to distinguish two cases
according as the covariate $Z$ is observable or not. If $Z$ is unobservable, (1.1)
corresponds to the latent effects model, although in the usual linear model setup
$\nu(Z)$ is taken to be a linear functional, whereas in (1.1) it is unspecified. If the
$Z_i$ are observable and regarded as identically distributed random variables with some
unspecified distribution and independent of the error $e_i$, then letting
$e_i^* = e_i + \nu(Z_i)$ we may still claim that the $e_i^*$ are i.i.d. random variables.
However, their distribution function is unlikely to be normal even if the $Z_i$ were
normally distributed; this is especially because of the unspecified nature of
$\nu(\cdot)$. On the other hand, since the $e_i^*$ are independent identically distributed
random variables, the classical nonparametric rank based tests are adaptable. This
naturally suggests that nonparametric tests based on rank statistics would have better
scope as well as power properties.

There is a much better perspective if the $Z_i$, though stochastic, are observable.
Unlike the parametric analysis of covariance (ANOCOVA), the nonparametric ANOCOVA
approach does not require linearity of regression. Quade (1969) considered a rank
ANOCOVA procedure based on the rank sum statistics, and that was extended immediately
to general scores tests in more general models by Puri and Sen (1971), where earlier
references are also cited. The work of Ghosh and Sen (1971) is also closely related to
this aspect of rank tests. In this context, because ranks are invariant under any
strictly monotone transformation of the covariates, the linearity of the regression on
the covariates is no longer necessary; the resulting rank ANOCOVA tests are therefore
much more robust than their parametric counterparts and typically have greater power
than nonparametric rank ANOVA tests which ignore the covariates. This improvement
stems from the fact that the joint distribution of the coordinatewise rank statistics
is typically close to a multinormal one, which validates the use of ANOCOVA tools even
when the underlying form of $\nu(\cdot)$ is nonlinear. Moreover, the rank tests are still
applicable if the $Z_i$ are observable but subject to measurement errors, as in the
model considered by Nummi and Möttönen (2004); then the $e_i^*$ are still i.i.d. random
variables, though with some other distribution function. This shows an advantage of
the nonparametric analysis of covariance procedures compared with other methods.

In a general regression setup where the regressors are stochastic, Ghosh and Sen
(1971) modified the usual rank tests for testing the hypothesis of no regression, and
in the measurement error model, Jurečková et al. (2010) considered suitable rank
tests. In both cases, the hypothesis of no regression generates the same invariance
structure which validates the conventional rank tests. This does not, however, exploit
the stochastic nature of the regressors to the fullest extent. In the present study, it is
demonstrated that the incorporation of rank analysis of covariance tools in the more
complex setup (1.1) yields rank tests which have better performance characteristics. To
emphasize this enhanced efficiency, extensive numerical studies on simulated as well
as real data are carried out. Section 2 is devoted to the preliminary notions and
description of the methods. Section 3 deals with the partially linear model with i.i.d.
nuisance covariates. Section 4 is devoted to rank analysis of covariance in partially
linear models, and Sections 5 and 6 provide numerical illustrations, both on simulated
and real data.


2 Preliminary notion 

We motivate our statistical models through an interesting case studied by Nummi and
Möttönen (2004). They described a computer-based forest harvesting technique in
Scandinavia, where the tree stems are converted into smaller logs and the stem height
and diameter measurements are taken at fixed intervals. The harvester receives the
length and diameter data at the $i$-th stem point from a sensor, and measuring and
computing equipment enables a computer-based optimization of crosscutting. Nummi
and Möttönen (2004) consider the model of regression dependence of the stem diameter
measurement $y_i$ on the stem height measurement $x_i$ at the $i$-th stem point,
$i = 1, \dots, n$. The problem of interest is the prediction of $y_i$ and the testing of
hypotheses on the parameters of the model; but both the stem diameter and the stem
height contain measurement errors. On top of that, the volume of the stem may not be
linearly related to its diameter; rather, it is more likely to be related to its height
and the cross-section, which may be roughly proportional to the square of the diameter.

There are many other similar problems which can be described by partially linear
regression models of the type (1.1), where $x_i$ is a $p$-vector covariate, $Z_i$ is a
$q$-vector covariate, the function $\nu(\cdot)$ is unknown, and the model error $e_i$ is
independent of $(x_i, Z_i)$, $i = 1, \dots, n$. This means that the response variable
$Y_i$ depends on the variable $x_i$ in a linear way but is still related to other
independent variables $Z_i$ in an unspecified form, $i = 1, \dots, n$. This model, along
with the measurement error model, is flexible and makes it possible to model various
situations in which latent variables are present.

In (1.1) we assume that the independent errors $e_1, \dots, e_n$ are identically distributed
according to an unknown distribution function $F$, and that
$\beta = (\beta_1, \dots, \beta_p)^\top$, $\beta^* = (\beta_0, \beta^\top)^\top$ are unknown
parameters. The function $\nu(\cdot)$ is unknown and the $Z_i$ are additional covariates;
if they are unobservable, then all $\nu(Z_i)$, $i = 1, \dots, n$, are latent random
variables. The rank tests for this situation with unobservable $Z_i$ are studied in
Section 3. If the $Z_i$ are observable, we can use this additional information even if
$\nu(\cdot)$ remains unknown, and apply the methods of the rank analysis of covariance;
importantly, this method is successful even when $Z_i$ itself is affected by a
measurement error (Section 4).

Our interest is to find how the rank tests of the hypothesis $H_0$ in (1.2) behave in the
described situations and to demonstrate their superiority to other methods. They are
distribution free and avoid an estimation of the nuisance $\nu(\cdot)$, which would always
worsen the rate of convergence of the whole procedure. The numerical study in Section 5
illustrates the good behavior of the rank tests in situations with various uncertainties.

3 Partially linear model with i.i.d. latent variables 

Consider the partially linear model (1.1) and the problem of testing the hypothesis
$H_0 : \beta = 0$, with $\beta_0$ and the function $\nu(\cdot)$ unknown, the $Z_i$ (scalar or
vector random variables) unobservable. The model can be rewritten as

$$Y_i = \beta_0 + x_i^\top \beta + e_i^*, \qquad e_i^* = e_i + \nu(Z_i), \quad i = 1, \dots, n. \tag{3.1}$$

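The regression matrix $X = X_n$ in model (1.1) is of order $n \times p$ with the rows
$x_i^\top$, $i = 1, \dots, n$. Denote by $X_n^0$ the matrix with the rows
$(x_i - \bar{x}_n)^\top$, $i = 1, \dots, n$, and assume that it satisfies

$$Q_n = \frac{1}{n} X_n^{0\top} X_n^0 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x}_n)(x_i - \bar{x}_n)^\top \to Q \quad \text{as } n \to \infty, \tag{3.2}$$

$$n^{-1} \max_{1 \le i \le n} \left\{ (x_i - \bar{x}_n)^\top Q_n^{-1} (x_i - \bar{x}_n) \right\} \to 0 \quad \text{as } n \to \infty,$$

where $Q$ is a positive definite $p \times p$ matrix.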

Assume that the distribution function $F$ of the errors $e_i$ has an absolutely continuous
density $f$ and finite Fisher information
$I(f) = \int \left( f'(z)/f(z) \right)^2 dF(z) < \infty$. Assume that
$Z_1, \dots, Z_n$ are i.i.d.; let $G$ be the common distribution function of $\nu(Z_i)$,
$i = 1, \dots, n$. It is unknown; we only assume that it has an absolutely continuous
density $g$ and finite Fisher information $I(g)$. Moreover, let $H$ denote the
distribution function of $e_i^*$, $i = 1, \dots, n$. Because $e_i^*$ is more dispersed than
$e_i$, we have $I(h) \le I(f)$, where $I(h)$ is the Fisher information of $H$, with
equality if $\nu(Z_i) = 0$ with probability 1 (see Hájek et al. (1999)).

Let $R_1, \dots, R_n$ be the ranks of $Y_1, \dots, Y_n$. The rank tests of $H_0 : \beta = 0$,
both in models (1.1) and (3.1), are based on the vector of linear rank statistics
$S_n \in \mathbb{R}^p$,

$$S_n = n^{-1/2} \sum_{i=1}^n (x_i - \bar{x}_n)\, a_n(R_i), \tag{3.3}$$

where the scores $a_n(i)$ are generated by a nondecreasing, square integrable score
function $\varphi : (0,1) \to \mathbb{R}^1$ in either of the following two ways:

$$a_n(i) = E\,\varphi(U_{n:i}), \tag{3.4}$$

$$a_n(i) = \varphi\left( \frac{i}{n+1} \right), \quad i = 1, \dots, n,$$

and $U_{n:1} \le \dots \le U_{n:n}$ are the order statistics corresponding to the sample of
size $n$ from the uniform $R(0,1)$ distribution. The test criterion for $H_0$ is the
quadratic form in $S_n$,

$$T_n^2 = (A(\varphi))^{-2}\, S_n^\top Q_n^{-1} S_n, \tag{3.5}$$

where

$$A^2(\varphi) = \int_0^1 (\varphi(t) - \bar{\varphi})^2\, dt, \qquad \bar{\varphi} = \int_0^1 \varphi(t)\, dt,$$

and because the ranks are distribution free, its asymptotic null distribution is $\chi^2$
with $p$ degrees of freedom, and the nonlinear regressor does not cause any bias.
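
For a scalar covariate ($p = 1$) and Wilcoxon scores, the criterion (3.5) can be computed in a few lines. The following is a minimal sketch rather than the authors' implementation: it assumes the normalization $S_n = n^{-1/2} \sum_i (x_i - \bar{x}_n) a_n(R_i)$, the approximate scores $a_n(i) = \varphi(i/(n+1))$ with $\varphi(t) = t - 1/2$ (so that $A^2(\varphi) = 1/12$), and no ties among the responses. The function names are hypothetical.

```python
import math


def ranks(values):
    """Ranks 1..n of the values (no ties assumed, as for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def wilcoxon_rank_test(x, y):
    """Quadratic rank criterion for scalar x with Wilcoxon scores.

    Returns the statistic and its asymptotic chi-square (1 d.f.) p-value.
    """
    n = len(y)
    xbar = sum(x) / n
    r = ranks(y)
    # Linear rank statistic with scores a_n(i) = i/(n+1) - 1/2.
    s_n = sum((xi - xbar) * (ri / (n + 1) - 0.5) for xi, ri in zip(x, r))
    s_n /= math.sqrt(n)
    q_n = sum((xi - xbar) ** 2 for xi in x) / n
    t2 = s_n ** 2 / (q_n / 12.0)               # A^2(phi) = 1/12
    p_value = math.erfc(math.sqrt(t2 / 2.0))   # upper tail of chi-square, 1 d.f.
    return t2, p_value
```

Because only the ranks of the responses enter, the value returned is unchanged by any strictly increasing transformation of the $Y_i$.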

On the other hand, the asymptotic distributions of $T_n^2$ under the local alternative

$$H_n : \beta = \beta_n = n^{-1/2} \beta^*, \quad \beta^* \in \mathbb{R}^p \text{ fixed}, \tag{3.7}$$

are the noncentral $\chi^2$ distributions with generally different noncentrality parameters.
The relative asymptotic efficiency of the test in the presence of the nonlinear covariate
with respect to that in a genuinely linear model is given in the following theorem:

Theorem 1 Let $T_n^2$ be the test criterion (3.5) for $H_0$ and $T_{n0}^2$ be its special case
corresponding to $P(\nu(Z) = 0) = 1$. Then

(i) Under $H_0$, both $T_n^2$ and $T_{n0}^2$ have asymptotically the $\chi^2$ distribution with
$p$ degrees of freedom, as $n \to \infty$.

(ii) The asymptotic relative efficiency of $T_n^2$ with respect to $T_{n0}^2$ under the local
alternative $H_n$ is

$$e(T_n, T_{n0}) = \left( \frac{\gamma(\varphi, h)}{\gamma(\varphi, f)} \right)^2 = \left( \frac{\int_0^1 h(H^{-1}(t))\, d\varphi(t)}{\int_0^1 f(F^{-1}(t))\, d\varphi(t)} \right)^2, \tag{3.8}$$

where $f$, $F$ are the density and distribution function of $e_i$ in model (1.1), $h$, $H$
are the same for $e_i^*$ in model (3.1), and where

$$\gamma(\varphi, h) = \int_0^1 \varphi(t)\, \varphi(t, h)\, dt, \qquad \varphi(t, h) = -\frac{h'(H^{-1}(t))}{h(H^{-1}(t))}, \tag{3.9}$$

and similarly for $\gamma(\varphi, f)$.
Proof. By Hájek et al. (1999), Sections V.1.5 and V.1.6, we have under $H_0$ as well
as under $H_n$

$$\| Q_n^{-1/2} [S_n - L_n] \| = o_p(1) \quad \text{as } n \to \infty, \tag{3.10}$$

where 

n 

Ln = X](xi - x„)(y9(i7(Ti)) 

i=\ 

here $U_{n1}, \dots, U_{nn}$ is a random sample from the uniform (0,1) distribution. Hence,
both $T_n^2$ and $T_{n0}^2$ are asymptotically $\chi^2$ distributed with $p$ degrees of freedom
under $H_0$. Under $H_n$, the asymptotic distribution of $T_n^2$ is the noncentral $\chi^2$
with $p$ degrees of freedom and with the noncentrality parameter

$$\Delta_H = \beta^{*\top} Q \beta^* \, \frac{\gamma^2(\varphi, h)}{A^2(\varphi)}, \tag{3.11}$$

while $H = F$ if $\nu(Z) = 0$ with probability 1. This yields (3.8) as the relative asymptotic
efficiency (ARE) of the test $T_n^2$ with respect to the test $T_{n0}^2$. □


For the special case of Wilcoxon scores, it follows that $e(T_n, T_{n0}) \le 1$, with the
equality sign holding when $\nu(Z) = 0$ with probability 1. A similar inequality holds for
the median test, if $f$ and $g$ [density of $\nu(Z)$] are symmetric around 0 and $f$ is
unimodal, because then $\gamma(\varphi, h) = h(0) \le f(0) = \gamma(\varphi, f)$, with the
equality sign holding when $\nu(Z) = 0$ with probability 1. For general scores, under
star-shaped ordering of $f$ and $h$ (Doksum (1969), Bickel and Lehmann (1979)), it follows
that $e(T_n, T_{n0}) \le 1$. If the test with score function $\varphi$ is asymptotically
optimal for $f$, i.e. if $\varphi(t) = \varphi(t, f)$, $0 < t < 1$, then
$e(T_n, T_{n0}) \le I(h)/I(f) \le 1$. In the general case,


$$e(T_n, T_{n0}) \le \frac{A^2(\varphi)\, I(h)}{\gamma^2(\varphi, f)}.$$




It may be of interest whether there is a positive lower bound to (3.8). However,
allowing the dispersion of $\nu(Z)$ to be large compared to that of $e$, it can be shown
that, under the same conditions as above, (3.8) can be made arbitrarily close to 0.
Thus, too much of latent effects can affect the efficacy of rank tests; the parametric
case is similar: if the dispersion of $\nu(Z)$ is large, the latent-effects model loses
its efficacy.
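
A worked illustration shows how quickly the efficiency can degrade. For Wilcoxon scores, $\varphi(t) = t - 1/2$, integration by parts in (3.9) gives $\gamma(\varphi, h) = \int h^2(x)\,dx$, so the efficiency reduces to $(\int h^2 / \int f^2)^2$. Under the purely illustrative assumption (not made in the text) that $e \sim N(0,1)$ and $\nu(Z) \sim N(0, \tau^2)$ are independent, $h$ is the $N(0, 1 + \tau^2)$ density and the efficiency equals $1/(1 + \tau^2)$, which tends to 0 as $\tau$ grows. The sketch below checks this numerically; the function names are hypothetical.

```python
import math


def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def integral_of_squared_density(sd, half_width=12.0, steps=20000):
    """Trapezoidal approximation of the integral of the squared N(0, sd^2) density."""
    a = -half_width * sd
    dx = 2.0 * half_width * sd / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(a + k * dx, sd) ** 2
    return total * dx


def wilcoxon_are(tau):
    """Efficiency for Wilcoxon scores when e ~ N(0,1) and nu(Z) ~ N(0, tau^2)."""
    gamma_f = integral_of_squared_density(1.0)
    gamma_h = integral_of_squared_density(math.sqrt(1.0 + tau ** 2))
    return (gamma_h / gamma_f) ** 2


for tau in (0.0, 1.0, 3.0):
    print(tau, round(wilcoxon_are(tau), 4), round(1.0 / (1.0 + tau ** 2), 4))
```

The two printed columns agree, illustrating that the efficiency is governed by the ratio of the error variance to the total variance in this Gaussian case.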

Besides the presence of a nonlinear nuisance regressor, the $Y_i$ can be further affected
by an additive measurement error. Hence, instead of $Y_i$ we observe $W_i = Y_i + V_i$,
$i = 1, \dots, n$, where the random errors $V_1, \dots, V_n$ are assumed to be i.i.d. and
independent of $Y_i$, $x_i$, $Z_i$, $i = 1, \dots, n$. Their distribution (say $G$) is
unknown; we only assume that it has an absolutely continuous density $g$. Then the
model (1.1) can be further rewritten in the form

$$W_i = \beta_0 + x_i^\top \beta + e_i^*, \qquad e_i^* = e_i + \nu(Z_i) + V_i, \quad i = 1, \dots, n.$$

Let $R_1, \dots, R_n$ denote the ranks of $W_1, \dots, W_n$. Under $H_0$, the $W_i$ are
independent and identically distributed, hence

$$P\left( (R_1, \dots, R_n) = (r_1, \dots, r_n) \right) = \frac{1}{n!}$$

for every permutation $(r_1, \dots, r_n)$ of $1, \dots, n$. The test of $H_0$ is then based on
the vector of linear rank statistics

$$S_n = n^{-1/2} \sum_{i=1}^n (x_i - \bar{x}_n)\, a_n(R_i). \tag{3.12}$$

The test criterion for $H_0$ is the quadratic form in $S_n$,

$$T_n^2 = (A(\varphi))^{-2}\, S_n^\top Q_n^{-1} S_n. \tag{3.13}$$


Because the statistic in (3.12) is distribution free under $H_0$, the test based on (3.13)
has the same null distribution as the one based on (3.5), and their common distribution
depends on the matrix $Q_n$. Hence, their asymptotic null distributions are the same, and
as such, they have the same critical region, which asymptotically can be approximated by
the right hand tail of the $\chi^2$ distribution with $p$ degrees of freedom. Its asymptotic
distribution under the local alternative (3.7) is noncentral $\chi^2$ with $p$ degrees of
freedom and the noncentrality parameter

$$\Delta_H = \beta^{*\top} Q \beta^* \, \frac{\gamma^2(\varphi, h)}{A^2(\varphi)},$$

where $H$ is the distribution function of $e_i^* = e_i + \nu(Z_i) + V_i$ and $h$ is its density.


4 Rank analysis of covariance in partially linear 
models 

Consider the model (1.1) as a partially linear model with possible measurement errors.
If the $Y_i$ are observed only with measurement errors, then these errors can be absorbed
in the errors $e_i$ of the model. More important is the case when the covariates $Z_i$ are
observed only with errors, so that we only observe $W_i = Z_i + \eta_i$, $i = 1, \dots, n$.
Then model (1.1) can be rewritten in the form

$$Y_i = \beta_0 + x_i^\top \beta + e_i^{**}, \tag{4.1}$$

$$e_i^{**} = e_i + \nu(W_i), \qquad W_i = Z_i + \eta_i, \quad 1 \le i \le n,$$

where $Y_i$, $x_i$ and $W_i$ are all observable, but $W_i$ and $e_i^{**}$ may no longer be
independent. Information on this dependence is recovered through the rank analysis of
covariance approach, whose invariance structure makes it possible to cope with this
dependence, and even enhances the power of the test of $H_0$. A semiparametric approach
estimating $\nu(W)$ nonparametrically, using a suitable smoothing tool, possibly leads to a
slower rate of convergence; inference on $\beta$ is then made in a parametric way.

Let $R_{ij}$ be the rank of $W_{ij}$ among $W_{1j}, \dots, W_{nj}$, $1 \le i \le n$;
$1 \le j \le q$. Denote $W_i = (W_{i1}, \dots, W_{iq})^\top$, $1 \le i \le n$. Moreover, let
$R_{i0}$ be the rank of $Y_i$ among $Y_1, \dots, Y_n$, $1 \le i \le n$. Denote

$$R_n = (R_1, \dots, R_n) \tag{4.2}$$

the $(q + 1) \times n$ rank collection matrix, where

$$R_i = (R_{i0}, R_{i1}, \dots, R_{iq})^\top, \quad 1 \le i \le n.$$

Recall that under $H_0 : \beta = 0$, the $(Y_i, W_i^\top)^\top$, $i = 1, \dots, n$, are independent
identically distributed $(q + 1)$-vectors, while $Y_i$ and $W_i$ are not necessarily
independent. Denote by $G^*(u)$, $u \in \mathbb{R}^{q+1}$, the distribution function of
$(Y_i, W_i^\top)^\top$.

Define a set of $(q + 1)$ scores $a_{nj}(k)$, $1 \le k \le n$, for $j = 0, 1, \dots, q$, in the
same manner as in Section 3. For notational simplicity, we may take
$a_{nj}(k) = a_n(k)$, $0 \le j \le q$, $k = 1, \dots, n$. Define the random $p$-vectors

$$T_{nj} = n^{-1/2} \sum_{i=1}^n (x_i - \bar{x}_n)\, a_n(R_{ij}), \quad 0 \le j \le q.$$


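Define the matrix $V_n$ of order $(q + 1) \times (q + 1)$ with the components

$$V_{nj\ell} = \frac{1}{n-1} \sum_{i=1}^n \left( a_n(R_{ij}) - \bar{a}_n \right) \left( a_n(R_{i\ell}) - \bar{a}_n \right), \quad j, \ell = 0, 1, \dots, q.$$

Under $H_0 : \beta = 0$, the $n$ columns of $R_n$ in (4.2) are interchangeable with the common
permutational (conditional, given the set of $n!$ possible realizations of $R_n$)
probability $1/n!$. Denoting this permutation measure by $\mathcal{P}_n$, we have

$$E_{\mathcal{P}_n} T_{nj} = 0, \qquad E_{\mathcal{P}_n} (T_{nj} T_{n\ell}^\top) = V_{nj\ell}\, Q_n \quad \text{for } j, \ell = 0, 1, \dots, q,$$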


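with $Q_n$ being the matrix defined in (3.2). Decompose the matrix $V_n$ in the form

$$V_n = \begin{pmatrix} V_{n00} & v_{n0}^\top \\ v_{n0} & V_{n11} \end{pmatrix} \tag{4.3}$$

and put

$$V_{n00.1} = V_{n00} - v_{n0}^\top V_{n11}^{-1} v_{n0}, \qquad T_{n0:1} = T_{n0} - T_n^* V_{n11}^{-1} v_{n0}, \tag{4.4}$$

where $T_n^* = (T_{n1}, \dots, T_{nq})$.

Thus, $T_{n0:1}$ is the vector of residual rank statistics of the $Y_i$ in the regression of
$T_{n0}$ on $T_n^*$. Note that

$$E_{\mathcal{P}_n} T_{n0:1} = 0, \qquad E_{\mathcal{P}_n} (T_{n0:1} T_{n0:1}^\top) = V_{n00.1}\, Q_n.$$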


This suggests the test criterion

$$\mathcal{L}_n^0 = \frac{1}{V_{n00.1}} \left( T_{n0:1}^\top Q_n^{-1} T_{n0:1} \right),$$

which can be further rewritten as


L 


0 

n 


Cn-C 


* 

n 


where 

= TlQ-^ ® V-iT„, Cl = ^ (TJoQ-iT„o) • 

^nOO 

Regarding the rank permutation distribution $\mathcal{P}_n$ described above, we conclude that
the critical region of $\mathcal{L}_n^0$ can be obtained by enumerating the $n!$ possible
permuted values of $R_n$ and the corresponding values of $\mathcal{L}_n^0$. Due to the
permutation invariance of the pertaining components, $\mathcal{L}_n^0$ is permutationally
distribution-free [permutation principle of Chatterjee and Sen (1964)]. Asymptotically,
as $n \to \infty$, the permutational distribution of $\mathcal{L}_n^0$ can be approximated by
the $\chi^2$ distribution with $p$ degrees of freedom.
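
The construction above can be sketched for the simplest case of a scalar regressor and a single observable covariate ($p = q = 1$). This is an illustration under stated assumptions, not the authors' code: it takes Wilcoxon scores $a_n(k) = k/(n+1) - 1/2$, the normalizations $T_{nj} = n^{-1/2} \sum_i (x_i - \bar{x}_n) a_n(R_{ij})$ and $V_{nj\ell}$ with divisor $n - 1$, assumes no ties, and uses the asymptotic $\chi^2_1$ tail for the p-value. The names `ranks` and `rank_anocova` are hypothetical.

```python
import math


def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def rank_anocova(x, y, w):
    """Rank ANOCOVA criterion for scalar x (p = 1) and scalar covariate w (q = 1)."""
    n = len(y)
    xbar = sum(x) / n
    q_n = sum((xi - xbar) ** 2 for xi in x) / n
    # Wilcoxon scores of the response (index 0) and of the covariate (index 1);
    # their average over k = 1..n is exactly 0.
    a = [[r / (n + 1) - 0.5 for r in ranks(v)] for v in (y, w)]
    t = [sum((xi - xbar) * aj for xi, aj in zip(x, a[j])) / math.sqrt(n)
         for j in (0, 1)]
    v = [[sum(s * u for s, u in zip(a[j], a[l])) / (n - 1) for l in (0, 1)]
         for j in (0, 1)]
    t_resid = t[0] - v[0][1] / v[1][1] * t[1]      # residual rank statistic
    v_resid = v[0][0] - v[0][1] ** 2 / v[1][1]     # residual score variance
    stat = t_resid ** 2 / (v_resid * q_n)
    p_value = math.erfc(math.sqrt(stat / 2.0))     # chi-square tail, 1 d.f.
    return stat, p_value
```

Since the response and the covariate enter only through their ranks, the statistic is unchanged when either is replaced by a strictly increasing transformation of itself, which is the invariance the text relies on.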

Under the local alternative (3.7),

$$V_n \to \Gamma \quad \text{as } n \to \infty, \tag{4.5}$$

the limiting rank score covariance matrix. Decompose $\Gamma$ analogously as in (4.3),

$$\Gamma = \begin{pmatrix} \gamma_{00} & \gamma_0^\top \\ \gamma_0 & \Gamma_{11} \end{pmatrix} \tag{4.6}$$

and put

$$\gamma_{00.1} = \gamma_{00} - \gamma_0^\top \Gamma_{11}^{-1} \gamma_0. \tag{4.7}$$

Note that the distribution of $T_n^*$ does not depend on (3.7) [as the $Z_i$ are i.i.d.], and
hence under (3.7) the shifts of $T_{n0:1}$ and of $T_{n0}$ coincide. Thus, under the local
alternative (3.7),

$$\mathcal{L}_n^0 \xrightarrow{\mathcal{D}} \chi^2_p(\Delta_H^*), \tag{4.8}$$

the noncentral $\chi^2$ with $p$ degrees of freedom and with noncentrality parameter

$$\Delta_H^* = \beta^{*\top} Q \beta^* \, \frac{\gamma^2(\varphi, h)}{\gamma_{00.1}}$$

with $\gamma_{00.1}$ defined in (4.7). This further implies

$$\gamma_{00.1} \le \gamma_{00} = A^2(\varphi), \tag{4.9}$$

where the equality sign holds only when $\gamma_0 = 0$; hence $\Delta_H^*$ cannot be smaller
than the noncentrality parameter $\Delta_H$ (see (3.11)) of the analysis of variance rank
test with the same score function. The asymptotic relative efficiency (ARE) of the
analysis of covariance rank test relative to the analysis of variance rank test, based on
the same score function $\varphi(\cdot)$, is given by

$$\text{ARE (ANOCOVA vs. ANOVA)} = \frac{\gamma_{00}}{\gamma_{00.1}} \ge 1; \tag{4.10}$$

hence, the analysis of covariance test is always at least as efficient as the analysis of
variance test.
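
For a single covariate ($q = 1$), the ratio in (4.10) equals $1/(1 - \rho^2)$, where $\rho$ is the correlation between the scores of the response and of the covariate; with Wilcoxon scores and no ties, its sample analogue is Spearman's rank correlation. The following minimal sketch computes the sample version of the ratio from data; the function names are hypothetical and the divisor $n - 1$ is an assumption that cancels in the ratio.

```python
def ranks(values):
    """Ranks 1..n of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def estimated_are(y, w):
    """Sample analogue of the ANOCOVA vs. ANOVA efficiency for one covariate."""
    n = len(y)
    a0 = [r / (n + 1) - 0.5 for r in ranks(y)]   # Wilcoxon scores of the response
    a1 = [r / (n + 1) - 0.5 for r in ranks(w)]   # Wilcoxon scores of the covariate
    v00 = sum(u * u for u in a0) / (n - 1)
    v11 = sum(u * u for u in a1) / (n - 1)
    v01 = sum(u * v for u, v in zip(a0, a1)) / (n - 1)
    return v00 / (v00 - v01 ** 2 / v11)


# Rank correlation 0.8 between y and w gives 1 / (1 - 0.64).
print(estimated_are([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

The gain is therefore negligible when the response and the covariate are nearly rank-uncorrelated and grows without bound as their rank correlation approaches one in absolute value.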

Summarizing, we conclude that the standard rank tests of linear hypothesis can be
used even in the presence of a nonlinear nuisance regression or if there are measurement
errors in the response or in the regressor, provided all these entities are i.i.d. and
independent of each other and of the model errors. If we use the test while ignoring
these disturbances, the probability of the error of the first kind is unchanged, while the
disturbances only affect the power. If the nuisance regressors are observable, using the
rank analysis of covariance further enhances the power.

4.1 Mixed linear model 

Consider the mixed model

$$Y_i = \beta_0 + x_i^\top \beta + Z_i^\top \gamma + e_i, \tag{4.11}$$

where $Z_i$, $i = 1, \dots, n$, are stochastic $q$-vectors and $\gamma$ is an unknown parameter.
The mixed linear models with random and nonrandom covariates were studied in
monographs by Khuri et al. (1998) and by Muller and Stewart (2006); the first one has a
more theoretical flavor, while the second one focuses on detailed applications. However,
if the random covariate is observed with an error, even the mixed linear model leads to
the form (1.1) with a nonlinear nuisance regressor. Assume that the $Z_i$ are not directly
observable, but are subject to measurement errors $\eta_i$, hence the observable random
vectors are $W_i = Z_i + \eta_i$. Assume that the $\eta_i$ are independent of both $Z_i$ and
$e_i$. Without loss of generality assume that $E W_i = 0$. Notice that $Y_i$ and $W_i$ are
independent, given $Z_i$. Hence, the conditional density of
$Y_i^0 = Y_i - \beta_0 - x_i^\top \beta$ given $W_i$, denoted by $f_{(y^0|w)}(y^0|w)$, can be
written as

$$f_{(y^0|w)}(y|w) = \int f_{(y^0|w,z)}(y|w,z)\, f_{(z|w)}(z|w)\, dz. \tag{4.12}$$

If the two conditional densities are Gaussian, then (4.11) corresponds to the linear
measurement error model, with $\gamma$ replaced by $K^\top \gamma$, where $K$ is the matrix
$\Sigma_Z (\Sigma_W)^{-1}$. However, if the two densities are not Gaussian, then (4.12)
involves a nuisance function $\nu(W_i)$, where the form of $\nu(\cdot)$ is unspecified,
depending on the unknown densities. Note that here the $W_i$ are observable, not the
$Z_i$, hence we will have $\nu(W_i)$ instead of $\nu(Z_i)$ in (1.1). Thus, even for the mixed
linear model (4.11), if the densities are not all Gaussian, we may not have a linear
model, but based on (4.12) we can adapt a partially linear model as in (1.1). This
enables us to incorporate rank analysis of covariance tests to have better power
properties. There is, however, one salient point that we need to emphasize here. Since
the rank ANOCOVA test is conditionally (permutationally) distribution-free, for small to
moderate sample sizes the permutation distribution needs to be enumerated to compute the
permutational critical values. This task is quite manageable for small sample sizes but
becomes prohibitively laborious as the sample sizes increase. For large samples the
asymptotics work out well; for moderate to small sample sizes, classical resampling
tools (such as jackknife or bootstrap methods) can be used to aid the enumeration of the
permutation distribution. We refer to the next two sections for these refinements.
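
When full enumeration of the $n!$ permutations is out of reach, one common substitute is to sample permutations at random. The sketch below shows that idea; it is a Monte Carlo permutation approximation, which is not the jackknife or bootstrap mentioned above, and all names in it are hypothetical. The design values are permuted against the intact pairs $(Y_i, W_i)$, which is equivalent to permuting the columns of the rank matrix, and any statistic with the signature `stat(x, rows)` can be plugged in; the toy statistic is included only to make the example runnable.

```python
import random


def permutation_p_value(stat, x, rows, n_perm=2000, seed=0):
    """Monte Carlo approximation to the permutation p-value of `stat`.

    rows[i] holds the observed pair (y_i, w_i); the pairs are kept intact
    and the design values x are permuted against them.
    """
    rng = random.Random(seed)
    observed = stat(x, rows)
    x_perm = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        if stat(x_perm, rows) >= observed:
            exceed += 1
    # The observed arrangement is counted as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


def squared_rank_covariance(x, rows):
    """Toy statistic: squared covariance between x and the ranks of the responses."""
    y = [row[0] for row in rows]
    order = sorted(range(len(y)), key=lambda i: y[i])
    rank = [0] * len(y)
    for pos, i in enumerate(order, start=1):
        rank[i] = pos
    xbar = sum(x) / len(x)
    return sum((xi - xbar) * ri for xi, ri in zip(x, rank)) ** 2
```

The number of sampled permutations controls the resolution of the p-value: with `n_perm` draws the smallest attainable value is `1 / (n_perm + 1)`.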


5 Numerical illustrations 

In order to illustrate the proposed procedures in finite sample situations, we have
conducted a simulation study.

We considered three semiparametric partially linear models 

Yi = (3 q + Xi(3i + Wi7 + Ci, i = 1,..., n, (5.1) 

Yi = (3 q + Xi(3i + Wi6 + Ci, i = 1,.. .,n, (5.2) 

$$Y_i = \beta_0 + x_i \beta_1 + \sin(w_i) + e_i, \quad i = 1, \dots, n, \tag{5.3}$$

with $w_i = z_i + \eta_i$, $1 \le i \le n$, where the $\eta_i$, $1 \le i \le n$, are measurement
errors. The errors $e_i$, $i = 1, \dots, n$, were simulated from the normal $N(0,1)$, Laplace
$L(0,1)$ and Cauchy distributions, respectively. The measurement errors $\eta_i$,
$1 \le i \le n$, were generated independently from the normal $N(0, 0.7)$, $N(0, 2)$ and
uniform $U(-1, 1)$ distributions.

The design points $x_1, \dots, x_n$ were generated from the uniform distribution on the
interval $(-2, 10)$ and $z_1, \dots, z_n$ from the uniform distribution on the interval
$(-10, 30)$. They remain fixed for all simulations under given $n$.

The following parameter values of models were used: 

• sample sizes: n = 20, 100, 500; 

• $\beta_0 = 1$;

• $\beta_1 = -0.5, -0.4, \dots, 0, \dots, 0.4, 0.5$;

• $\gamma = 3$;

• $\delta = -2$.

Our interest is testing the hypothesis $H : \beta_1 = 0$ against the alternative
$K : \beta_1 \ne 0$. We use the test criteria $T_n^2$ in (3.13) and $\mathcal{L}_n^0$ in (4.8).
For each combination of the parameters and each distribution of the measurement errors,
10 000 replications of the models were simulated, and the test criteria were then
computed for the Wilcoxon scores. The level $\alpha = 0.05$ test was performed every time,
and the mean power of the pertaining tests was then calculated. Figures 1-3 compare the
powers in model (5.2) with the standard normal distribution of the errors $e_i$,
$i = 1, \dots, n$, for various sample sizes. We can see that the results for small $n$, i.e.
$n = 20$, are not overly good, but the results are much better for larger sample sizes.
Comparing Figures 3 and 4 shows the effect of the distribution of the errors $e_i$,
$i = 1, \dots, n$, in model (5.2) with $n = 500$. Figures 3, 5 and 6 compare the powers for
different models, i.e. for (5.2), (5.1) and (5.3), for sample size $n = 500$. Figure 7
compares the empirical powers based on $\mathcal{L}_n^0$ and $T_n^2$ in the models (5.2) and
(5.3) for sample size $n = 500$. Superimposing the power of the analysis of covariance
rank test on that of the analysis of variance rank test, we see that the analysis of
covariance test performs better than the analysis of variance test in all cases, more
prominently for large sample sizes and when the measurement error variance is not small
compared to the error variance of the $e_i$. This is perfectly in line with our
theoretical claim in (4.10). When the measurement error variance is small, the rank
covariance between the response and the covariate is likely to be small too, and hence
the supremacy of the analysis of covariance test over the analysis of variance test is
less perceptible for $n = 20$ (see Fig. 1 and 2). The picture becomes more pronounced for
larger sample sizes (Fig. 4-7).
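
The simulation design described above can be sketched as follows for model (5.3) and the analysis of variance rank test with Wilcoxon scores. This is a scaled-down illustration, not the authors' code: it uses far fewer replications than the 10 000 reported, reads $N(0, 0.7)$ as a normal distribution with variance 0.7 (an assumption, since the text does not say whether the second parameter is a variance or a standard deviation), and rejects when the statistic exceeds the 0.95 quantile of $\chi^2_1$. The function names are hypothetical.

```python
import math
import random

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 d.f.


def wilcoxon_t2(x, y):
    """Quadratic rank criterion for scalar x with scores i/(n+1) - 1/2."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    xbar = sum(x) / n
    s = 0.0
    for pos, i in enumerate(order, start=1):
        s += (x[i] - xbar) * (pos / (n + 1) - 0.5)
    q_n = sum((xi - xbar) ** 2 for xi in x) / n
    return (s * s / n) / (q_n / 12.0)


def empirical_power(beta1, n=100, reps=200, seed=0):
    """Share of rejections of H: beta1 = 0 at level 0.05 under model (5.3)."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 10.0) for _ in range(n)]    # design points, kept fixed
    z = [rng.uniform(-10.0, 30.0) for _ in range(n)]   # covariate values, kept fixed
    rejections = 0
    for _ in range(reps):
        # Assumption: N(0, 0.7) is read as variance 0.7.
        w = [zi + rng.gauss(0.0, math.sqrt(0.7)) for zi in z]
        y = [1.0 + beta1 * xi + math.sin(wi) + rng.gauss(0.0, 1.0)
             for xi, wi in zip(x, w)]
        if wilcoxon_t2(x, y) > CHI2_1_95:
            rejections += 1
    return rejections / reps


for b in (0.0, 0.1, 0.5):
    print(b, empirical_power(b))
```

Evaluating `empirical_power` over the grid of $\beta_1$ values listed above traces an empirical power curve of the kind shown in the figures, here for the criterion of Section 3 only.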

We have made more extensive simulation experiments. Particularly, various score 
functions, design vectors, other underlying distributions of the error terms and the 
measurement errors with small variance were considered. The results were very good 
for larger sample sizes, similar to Figures 3-6. Naturally, the results are considerably 
affected by the distributions of the error terms, but on the other hand, the influence of 
the measurement errors with small variances and of the function $\nu$ of the covariate $z$
is not so substantial. Here, too, the analysis of covariance tests give better results. 


6 Application to the precipitation dataset 

The test described above is applied to a datasets of 1-day precipitation amounts. This 
application makes use outputs of coupled atmosphere and ocean general circulation 
models of the NOAA Geophysical Fluid Dynamics Laboratory. The outputs with the 
daily resolution are available in the form of transient climate change simulations carried 
out under increasing greenhouse gas concentrations according to prescribed emission 
scenarios over 1961-2100. Models have a horizontal resolution 2.5 x 2.0° (longitude x 
latitude) for South America. 

A variable of primary interest Y (precipitation) is modeled using additional covariates: 
the time index x and the southern oscillation index Z, which is calculated from 
the monthly or seasonal fluctuations in the air pressure difference between Tahiti and 
Darwin. The model under consideration has the form: 

$$Y_i = \beta_0 + x_i \beta_1 + \nu(Z_i) + e_i, \quad i = 1, \ldots, n,$$

where the $Z_i$ are observable but probably with measurement errors, and $\nu(\cdot)$ is 
unknown. 

For each scenario and gridpoint we tested the significance of the time index, i.e. 
$H: \beta_1 = 0$ against the alternative $K: \beta_1 \neq 0$. Table 6.1 summarizes the 
test results for all 888 gridpoints and three scenarios. 


Table 6.1. Rejection and non-rejection of the null hypothesis at level $\alpha = 0.05$ 

Figure 1: Empirical power of the Wilcoxon test based on £°(top) and (bottom) 
for n = 20 in the model (5.2) under the standard normal errors $e_i$, $i = 1, \ldots, n$. 
Solid line corresponds to the standard test, i.e. $W_i = Z_i$, $i = 1, \ldots, n$. The situations 
where $Z_1, \ldots, Z_n$ are affected by random errors are denoted by the dashed line (normal 
distribution $\mathcal{N}(0, 0.7)$), the dotted line (normal distribution $\mathcal{N}(0, 2)$) and dotdash line 
(uniform $U[-1, 1]$). 

Figure 2: Empirical power of the Wilcoxon test based on £°(top) and (bottom) 
for n = 100 in the model (5.2) under the standard normal errors $e_i$, $i = 1, \ldots, n$. 
Solid line corresponds to the standard test, i.e. $W_i = Z_i$, $i = 1, \ldots, n$. The situations 
where $Z_1, \ldots, Z_n$ are affected by random errors are denoted by the dashed line (normal 
distribution $\mathcal{N}(0, 0.7)$), the dotted line (normal distribution $\mathcal{N}(0, 2)$) and dotdash line 
(uniform $U[-1, 1]$). 

Figure 3: Empirical power of the Wilcoxon test based on £°(top) and (bottom) 
for n = 500 in the model (5.2) under the standard normal errors $e_i$, $i = 1, \ldots, n$. 
Solid line corresponds to the standard test, i.e. $W_i = Z_i$, $i = 1, \ldots, n$. The situations 
where $Z_1, \ldots, Z_n$ are affected by random errors are denoted by the dashed line (normal 
distribution $\mathcal{N}(0, 0.7)$), the dotted line (normal distribution $\mathcal{N}(0, 2)$) and dotdash line 
(uniform $U[-1, 1]$). 

Figure 4: Empirical power of the Wilcoxon test based on £°(top) and (bottom) 
for n = 500 in the model (5.2) under the Cauchy distributed errors $e_i$, $i = 1, \ldots, n$. 
Solid line corresponds to the standard test, i.e. $W_i = Z_i$, $i = 1, \ldots, n$. The situations 
where $Z_1, \ldots, Z_n$ are affected by random errors are denoted by the dashed line (normal 
distribution $\mathcal{N}(0, 0.7)$), the dotted line (normal distribution $\mathcal{N}(0, 2)$) and dotdash line 
(uniform $U[-1, 1]$). 

Figure 5: Empirical power of the Wilcoxon test based on £°(top) and (bottom) 
for n = 500 in the model (5.1) under the standard normal errors $e_i$, $i = 1, \ldots, n$. 
Solid line corresponds to the standard test, i.e. $W_i = Z_i$, $i = 1, \ldots, n$. The situations 
where $Z_1, \ldots, Z_n$ are affected by random errors are denoted by the dashed line (normal 
distribution $\mathcal{N}(0, 0.7)$), the dotted line (normal distribution $\mathcal{N}(0, 2)$) and dotdash line 
(uniform $U[-1, 1]$). 

Figure 6: Empirical power of the Wilcoxon test based on £°(top) and (bottom) 
for n = 500 in the model (5.3) under the standard normal errors $e_i$, $i = 1, \ldots, n$. 
Solid line corresponds to the standard test, i.e. $W_i = Z_i$, $i = 1, \ldots, n$. The situations 
where $Z_1, \ldots, Z_n$ are affected by random errors are denoted by the dashed line (normal 
distribution $\mathcal{N}(0, 0.7)$), the dotted line (normal distribution $\mathcal{N}(0, 2)$) and dotdash line 
(uniform $U[-1, 1]$). 

Figure 7: Comparison of empirical power based on (solid line) and (dotted line) 
for n = 500 in the models (5.2) (top) and (5.3) (bottom) under the standard normal 
errors $e_i$, $i = 1, \ldots, n$. The covariates $Z_1, \ldots, Z_n$ are affected by random errors coming 
from the normal distribution $\mathcal{N}(0, 0.7)$. 

scenario              # of rejections of H    # of non-rejections of H 
scenario 1 (m21af)    465                     423 
scenario 2 (m21a2)    576                     312 
scenario 3 (m21bl)    598                     290 
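
As a small arithmetic check on Table 6.1, the sketch below converts the counts into rejection proportions and compares them with the number of rejections one would expect if H held at every gridpoint. It uses only the counts and the level 0.05 reported above; the variable and function names are hypothetical, and the benchmark assumes each individual test has size close to 0.05.

```python
# Counts copied from Table 6.1: (rejections, non-rejections) per scenario
table_6_1 = {
    "scenario 1": (465, 423),
    "scenario 2": (576, 312),
    "scenario 3": (598, 290),
}
ALPHA = 0.05          # significance level used in Table 6.1
N_GRIDPOINTS = 888    # number of gridpoints tested per scenario


def rejection_summary(table, n_gridpoints):
    """Return the share of gridpoints at which H was rejected, per scenario."""
    summary = {}
    for name, (rejected, not_rejected) in table.items():
        # each row of the table must account for every gridpoint
        assert rejected + not_rejected == n_gridpoints
        summary[name] = rejected / n_gridpoints
    return summary


summary = rejection_summary(table_6_1, N_GRIDPOINTS)
expected_under_null = ALPHA * N_GRIDPOINTS   # expected rejections if H held everywhere

for name, share in summary.items():
    print(f"{name}: {share:.1%} of gridpoints reject H")
print(f"expected rejections if H held everywhere: {expected_under_null:.1f}")
```

In every scenario more than half of the gridpoints reject H, far above the roughly 44 rejections that chance alone would produce at this level.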